Genetic Optimization of Keywords Subset in the Classification Analysis of Texts Authorship

Author

  • Bohdan Pavlyshenko
Abstract

This paper analyzes the genetic selection of a keyword set whose text frequencies serve as attributes in text classification. The genetic optimization was performed on a set of words that forms the fraction of a frequency dictionary lying within given frequency limits. The frequency dictionary was built from the analyzed corpus of English fiction texts. The fitness function minimized by the genetic algorithm was the error of a k-nearest-neighbors classifier. The results show high precision and recall when classifying texts by authorship category using the keyword attributes that the genetic algorithm selected from the frequency dictionary.
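The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the toy corpus, the function names, the bit-mask encoding, leave-one-out error estimation, one-point crossover, truncation selection, and all parameter values (`pop_size`, `generations`, `p_mut`, `k`) are assumptions made for the example. The abstract only states that a genetic algorithm minimizes the error of a k-nearest-neighbors classifier over a frequency-limited slice of the frequency dictionary.

```python
import random
from collections import Counter

def frequency_dictionary(texts, lo, hi):
    """Words whose total corpus count lies within [lo, hi] (assumed limits)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return sorted(w for w, c in counts.items() if lo <= c <= hi)

def features(text, keywords):
    """Relative frequency of each keyword in a single text."""
    words = text.lower().split()
    counts = Counter(words)
    n = max(len(words), 1)
    return [counts[k] / n for k in keywords]

def knn_error(X, y, k=1):
    """Leave-one-out error of a k-nearest-neighbors classifier."""
    errors = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / len(X)

def fitness(mask, vocab, texts, labels, k=1):
    """Classifier error for the keyword subset encoded by a 0/1 mask."""
    keywords = [w for w, bit in zip(vocab, mask) if bit]
    if not keywords:
        return 1.0  # an empty keyword set cannot classify anything
    X = [features(t, keywords) for t in texts]
    return knn_error(X, labels, k)

def genetic_select(vocab, texts, labels, pop_size=20, generations=30,
                   p_mut=0.05, seed=0):
    """Minimize the kNN error over keyword subsets with a simple GA."""
    rng = random.Random(seed)
    score = lambda m: fitness(m, vocab, texts, labels)
    pop = [[rng.randint(0, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(vocab))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit flips
            children.append(child)
        pop = parents + children
    best = min(pop, key=score)
    return [w for w, bit in zip(vocab, best) if bit], score(best)

# Hypothetical toy corpus: two "authors", two texts each.
texts = [
    "the sea and the ship and the storm",
    "the ship sailed the sea in the storm",
    "the moor and the manor and the ghost",
    "the ghost walked the moor by the manor",
]
labels = ["A", "A", "B", "B"]

vocab = frequency_dictionary(texts, lo=2, hi=3)
keywords, error = genetic_select(vocab, texts, labels)
print(keywords, error)
```

The frequency limits drop both very common words such as "the" and words that occur only once, so the genetic algorithm searches only the mid-frequency slice of the dictionary. On a real corpus the error would normally be estimated on held-out texts rather than by leave-one-out on the training set.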

Similar resources

Determining Optimal Support Vector Machines in Hyperspectral Image Classification Based on a Genetic Algorithm

Hyperspectral remote sensing imagery, with its rich spectral information, provides an efficient tool for ground classification in complex geographical areas with similar classes. Because Support Vector Machines (SVMs) are robust in high-dimensional spaces, they are an efficient tool for classifying hyperspectral imagery. However, there are two optimization issues which s...

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One problem that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

OPTIMIZATION OF SKELETAL STRUCTURES USING IMPROVED GENETIC ALGORITHM BASED ON PROPOSED SAMPLING SEARCH SPACE IDEA

In this article, the design space is partitioned in order to increase the speed of GA optimization. To this end, the design space is searched in two steps: a global search and a local search. Following the meshing approach of FEM, the list of sections is first divided into specific subsets. Then the intermediate member of each subset, as the representative of that subset, is defined...

A Method for Keyword Extraction and Word Weighting to Improve Persian Text Classification

Due to the ever-increasing expansion of information and the huge amount of existing unstructured documents, keywords play a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. This research uses a thesaurus (a structured word-net) to extract them automatically. A...

SELECTION OF SUITABLE RECORDS FOR NONLINEAR ANALYSIS USING GENETIC ALGORITHM (GA) AND PARTICLE SWARM OPTIMIZATION (PSO)

This paper presents a suitable and quick way to choose earthquake records for non-linear dynamic analysis using optimization methods. In addition, these earthquake records are scaled. Structural responses of three different soil-frame models were therefore examined, the change in the maximum roof displacement was analyzed, and the damage index of the whole structures was measured. The soil classifica...


Journal title:
  • CoRR

Volume abs/1211.3402

Publication date 2012